This notebook features an introduction to PixieDust, the Python library that makes data visualization easy.
This notebook is simple and largely self-explanatory, but it helps to keep the PixieDust documentation open for reference.
New to notebooks? Don't worry, all you need to know to use this notebook is that to run code cells, put your cursor in the cell and press Shift + Enter.
In [1]:
# Make sure you have the latest version of PixieDust installed on your system
# Only run this cell if you did _not_ install PixieDust from source
# To confirm you have the latest, uncomment the next line and run this cell
#!pip install --user --upgrade pixiedust
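If you want to check which version is installed before or after upgrading, the following is a minimal sketch that uses only the Python standard library (it assumes Python 3.8 or later). The helper name `installed_version` is hypothetical and is not part of PixieDust.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if pixiedust is not installed
print("pixiedust version:", installed_version("pixiedust"))
```

A result of `None` means pip cannot find the package in the current environment, in which case the install command in the cell above is the next step.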
Now that you have PixieDust installed and up-to-date on your system, you need to import it into this notebook. This is the last dependency before you can play with PixieDust.
In [1]:
# Run this cell
import pixiedust
Once you see the success message output from running import pixiedust, you're all set.
In [3]:
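# Run this cell to
# a) build a SQL context for a Spark dataframe
#    (sc is assumed to be the SparkContext predefined by the notebook's Spark kernel)
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
# b) create Spark dataframe, and assign it to a variable
df = sqlContext.createDataFrame(
    [("Green", 75),
     ("Blue", 25)],
    ["Colors", "%"])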
The data in the variable we just created is ready to be displayed, without any code other than the call to display().
In [3]:
# Run this cell to display the dataframe above as a pie chart
display(df)
After running the cell above, you should have seen a Spark dataframe displayed as a pie chart, along with some controls to tweak the display. All that came from passing the dataframe variable to display().
In the next cell, we'll pass more interesting data to display(), which will also offer more advanced controls.
In [4]:
# create another dataframe, in a new variable
df2 = sqlContext.createDataFrame(
[(2010, 'Camping Equipment', 3),
(2010, 'Golf Equipment', 1),
(2010, 'Mountaineering Equipment', 1),
(2010, 'Outdoor Protection', 2),
(2010, 'Personal Accessories', 2),
(2011, 'Camping Equipment', 4),
(2011, 'Golf Equipment', 5),
(2011, 'Mountaineering Equipment', 2),
(2011, 'Outdoor Protection', 4),
(2011, 'Personal Accessories', 2),
(2012, 'Camping Equipment', 5),
(2012, 'Golf Equipment', 5),
(2012, 'Mountaineering Equipment', 3),
(2012, 'Outdoor Protection', 5),
(2012, 'Personal Accessories', 3),
(2013, 'Camping Equipment', 8),
(2013, 'Golf Equipment', 5),
(2013, 'Mountaineering Equipment', 3),
(2013, 'Outdoor Protection', 8),
(2013, 'Personal Accessories', 4)],
["year","category","unique_customers"])
# This time, we've combined the dataframe creation and the display() call in the same cell
# Run this cell
display(df2)
This chart, like the first one, is rendered by matplotlib. With PixieDust, you have other options. To toggle between renderers, use the Renderers control at the top right of the display output.
Use the Options button to explore other display configurations, e.g., clustering. To learn more, see https://pixiedust.github.io/pixiedust/displayapi.html
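To get a feel for what an aggregated chart of this data represents, here is a minimal pure-Python sketch that sums unique_customers per year. It assumes the chart is configured with year as the key, unique_customers as the value, and a sum aggregation. PixieDust performs its own grouping when you pick fields in the chart options, so this is only an illustration, and the helper name `sum_by_key` is hypothetical.

```python
from collections import defaultdict

# Same rows as df2 above: (year, category, unique_customers)
rows = [
    (2010, 'Camping Equipment', 3),
    (2010, 'Golf Equipment', 1),
    (2010, 'Mountaineering Equipment', 1),
    (2010, 'Outdoor Protection', 2),
    (2010, 'Personal Accessories', 2),
    (2011, 'Camping Equipment', 4),
    (2011, 'Golf Equipment', 5),
    (2011, 'Mountaineering Equipment', 2),
    (2011, 'Outdoor Protection', 4),
    (2011, 'Personal Accessories', 2),
    (2012, 'Camping Equipment', 5),
    (2012, 'Golf Equipment', 5),
    (2012, 'Mountaineering Equipment', 3),
    (2012, 'Outdoor Protection', 5),
    (2012, 'Personal Accessories', 3),
    (2013, 'Camping Equipment', 8),
    (2013, 'Golf Equipment', 5),
    (2013, 'Mountaineering Equipment', 3),
    (2013, 'Outdoor Protection', 8),
    (2013, 'Personal Accessories', 4),
]


def sum_by_key(records, key_index, value_index):
    """Sum the value column for each distinct entry in the key column."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[value_index]
    return dict(totals)


# Group by year (column 0) and sum unique_customers (column 2)
totals = sum_by_key(rows, key_index=0, value_index=2)
for year in sorted(totals):
    print(year, totals[year])
```

Grouping on column 1 instead of column 0 gives per-category totals, which corresponds to choosing category as the key.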
In [5]:
# Load a CSV with pixiedust.sampleData()
df3 = pixiedust.sampleData("https://github.com/ibm-watson-data-lab/open-data/raw/master/cars/cars.csv")
display(df3)
You should see a scatterplot above, rendered again by matplotlib. Look at the Renderer menu at the top right. You should see options for Bokeh and, now, Seaborn. If you don't see Seaborn, it's not installed on your system. No problem, just install it by running the next cell.
In [28]:
# To install Seaborn, uncomment the next line, and then run this cell
#!pip install --user seaborn
If you installed Seaborn, you'll also need to restart your notebook kernel and run the cell that imports pixiedust again. Find Restart in the Kernel menu above.
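After restarting, you can check which renderer libraries the kernel can import. The sketch below uses only the standard library. The helper name `is_installed` is hypothetical, and the module names are the usual import names for these packages rather than anything PixieDust-specific.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found by the current kernel."""
    return find_spec(module_name) is not None


# Report the availability of each renderer library mentioned in this notebook
for name in ("matplotlib", "bokeh", "seaborn"):
    print(name, "available" if is_installed(name) else "not installed")
```

If seaborn is reported as not installed right after a pip install, the kernel probably has not been restarted yet.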